
pwmgen: prevent GCC from emitting SSE in base-thread function #3896

Closed
grandixximo wants to merge 1 commit into LinuxCNC:master from grandixximo:fix/pwmgen-sse-base-thread

@grandixximo

make_pulses() is exported with uses_fp=0 for the base-thread, but GCC optimizes its struct zeroing into SSE instructions (pxor/movups on XMM registers). Since RTAI skips the FPU/SSE save/restore for non-FP threads, this silently corrupts the XMM state of whatever Linux process was running, causing heap corruption, segfaults, and hard crashes.
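For illustration, here is the kind of code that can trigger this. The struct and function names are hypothetical and not taken from pwmgen.c. There is no floating point in the source, yet when SSE is enabled for the compilation unit, GCC at -O2 on x86-64 may lower the 32-byte reset to a pxor of an XMM register followed by movups stores; whether it actually does depends on the GCC version, flags and struct layout.

```c
#include <stdint.h>

/* Hypothetical per-channel state, loosely modelled on a PWM generator.
 * Field names are illustrative only; this is not the pwmgen source. */
struct pwm_chan_state {
    int64_t  accum;      /* phase accumulator */
    int64_t  period;     /* period length in base-thread ticks */
    int32_t  high_time;  /* ticks the output stays high per period */
    int32_t  counter;    /* ticks elapsed in the current period */
    uint32_t flags;      /* miscellaneous state bits */
    uint32_t out;        /* current output level (0 or 1) */
};                       /* 32 bytes in total on common ABIs */

/* Pure integer code at the source level. With SSE available, the compiler
 * may still implement this store of 32 zero bytes with an XMM register
 * (pxor + movups). Inspect the generated code with objdump -d. */
void pwm_chan_reset(struct pwm_chan_state *s)
{
    *s = (struct pwm_chan_state){0};
}
```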

Add __attribute__((target("general-regs-only"))) to make_pulses() to force GCC to use only general-purpose registers.
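A minimal sketch of how such an annotation can be applied, assuming a GCC new enough to accept general-regs-only as an x86 target attribute. The macro RT_GENERAL_REGS_ONLY and the functions pwm_chan_reset and pwm_chan_step are hypothetical; pwm_chan_step only mirrors the void fn(void *arg, long period) shape of HAL thread functions and is not the real make_pulses(). On other compilers or architectures the macro expands to nothing so the sketch still builds.

```c
#include <stdint.h>

/* Hypothetical macro: on GCC for x86, restrict the annotated function to
 * general-purpose registers so no x87/SSE/AVX code can be emitted in it. */
#if defined(__GNUC__) && !defined(__clang__) && (defined(__x86_64__) || defined(__i386__))
#define RT_GENERAL_REGS_ONLY __attribute__((target("general-regs-only")))
#else
#define RT_GENERAL_REGS_ONLY
#endif

/* Hypothetical channel state; not the pwmgen data structure. */
struct pwm_chan_state {
    int64_t  accum;
    int64_t  period;     /* period length in base-thread ticks */
    int32_t  high_time;  /* ticks the output stays high per period */
    int32_t  counter;    /* ticks elapsed in the current period */
    uint32_t flags;
    uint32_t out;        /* current output level (0 or 1) */
};

/* The zeroing must now be lowered to integer stores (e.g. mov or rep stos).
 * noinline is a precaution so the body is not merged into a caller that is
 * compiled with the normal, SSE-enabled options. */
RT_GENERAL_REGS_ONLY __attribute__((noinline))
void pwm_chan_reset(struct pwm_chan_state *s)
{
    *s = (struct pwm_chan_state){0};
}

/* HAL-shaped base-thread step: output is high for the first high_time
 * ticks of every period, then the counter wraps back to zero. */
RT_GENERAL_REGS_ONLY __attribute__((noinline))
void pwm_chan_step(void *arg, long period)
{
    struct pwm_chan_state *s = arg;
    (void)period;        /* thread period in ns; unused in this sketch */

    s->out = (s->counter < s->high_time) ? 1u : 0u;
    if (++s->counter >= s->period)
        s->counter = 0;
}
```

Since thread functions are normally invoked through a function pointer, the noinline is mostly belt-and-braces; the check that really matters is still disassembling the built object.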

Confirmed via objdump that the compiled function contains zero SSE instructions after this change. Tested on RTAI 5.4.280 with the etch-servo parport config which previously crashed reliably.
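The objdump check can be scripted. A minimal sketch in C, assuming objdump's default AT&T syntax (registers written as %xmm0 and so on); the names line_uses_simd and count_simd_lines are made up for this example. It flags any disassembly line that touches an MMX, SSE or AVX register.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if one line of AT&T-syntax objdump output references an MMX, SSE or
 * AVX register. x87 instructions (%st registers) are not checked, and with
 * objdump -M intel the % prefix is absent, so this test would need changing. */
bool line_uses_simd(const char *line)
{
    static const char *const regs[] = { "%xmm", "%ymm", "%zmm", "%mm" };
    for (size_t i = 0; i < sizeof regs / sizeof regs[0]; i++)
        if (strstr(line, regs[i]) != NULL)
            return true;
    return false;
}

/* Reads disassembly from `in`, echoes each offending line to stdout and
 * returns how many were found. Lines longer than the buffer are split by
 * fgets, which is harmless for normal objdump output. */
int count_simd_lines(FILE *in)
{
    char buf[512];
    int hits = 0;
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (line_uses_simd(buf)) {
            fputs(buf, stdout);
            hits++;
        }
    }
    return hits;
}
```

Wrapped in a main that returns non-zero whenever count_simd_lines(stdin) is positive, it can sit at the end of a pipe such as objdump -d pwmgen.o | ./check_simd (both file names are illustrative). That scans the whole object; on binutils versions that support it, objdump --disassemble=make_pulses narrows the output to the one function.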

See: #3895

@andypugh
Collaborator

andypugh commented Apr 3, 2026

Is this fix still needed, or are we going for the meta-fix?

@grandixximo
Author

I can work on ripping out non-FP support over the weekend. Once that is done, this will not be needed; it is just a quick fix for anyone who wants to run RTAI with pwmgen in the meantime, though I doubt anyone besides me is testing this.

grandixximo closed this Apr 7, 2026
@grandixximo
Author

Superseded by #3901.

@andypugh
Collaborator

andypugh commented Apr 7, 2026

I was about to ask for clarification about this: how do these three PRs fit together?

@grandixximo
Author

grandixximo commented Apr 7, 2026

Oh, this PR forcibly marked the PWM step-making function so that the compiler could not use floating-point or SSE instructions in its compiled assembly. The problem was that this function is usually added to the base-thread, which has uses_fp=0. RTAI skips the FPU/SSE save/restore for uses_fp=0 threads, because FP instructions are not supposed to appear there; that is a performance optimization which is no longer needed on today's hardware. But the compiler emits them anyway, even where they are not strictly necessary, because the performance landscape has changed with new hardware. Since we decided (wisely) to deprecate uses_fp, we no longer need to guard against this: we can compile normally, the compiler is free to emit FP/SSE instructions if it thinks they are faster, and we don't crash, because with #3901 every function is treated as uses_fp=1 without breaking the API. In the near future, uses_fp should be removed from the API entirely in another PR, which I can draft if you want. But that will break the API, so we are waiting for the other HAL API-breaking changes to come along as well, i.e. long_int with getter/setter and noparam.
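To make the reasoning above concrete, here is a toy model in plain C. It is not RTAI or HAL code, and every name in it (xmm_t, struct rt_funct, export_legacy, export_compat, run_tick, clobbering_fn) is hypothetical. It shows why skipping the FP context save for uses_fp=0 loses the interrupted task's register contents, and one possible way to get the behaviour described for #3901 (the flag is still accepted, but every function is treated as FP-using); it is not the actual #3901 diff.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model: one 128-bit "XMM register" owned by the interrupted Linux task. */
typedef struct { uint64_t lo, hi; } xmm_t;
static xmm_t xmm0;                       /* live register contents */

typedef void (*rt_fn)(void *arg, long period);

struct rt_funct {                        /* hypothetical export record */
    rt_fn fn;
    void *arg;
    int   uses_fp;
};

/* Legacy semantics: honour the caller's uses_fp flag. */
void export_legacy(struct rt_funct *f, rt_fn fn, void *arg, int uses_fp)
{
    f->fn = fn; f->arg = arg; f->uses_fp = uses_fp;
}

/* Deprecation without an API break: the parameter is still accepted,
 * but the FP context is always saved and restored. */
void export_compat(struct rt_funct *f, rt_fn fn, void *arg, int uses_fp)
{
    (void)uses_fp;                       /* kept only for source compatibility */
    f->fn = fn; f->arg = arg; f->uses_fp = 1;
}

/* One thread tick: save/restore the FP/SIMD context only when uses_fp is
 * set, modelling the optimisation RTAI applies to non-FP threads. */
void run_tick(const struct rt_funct *f, long period)
{
    xmm_t saved = {0, 0};
    if (f->uses_fp)
        saved = xmm0;                    /* save the interrupted task's state */
    f->fn(f->arg, period);
    if (f->uses_fp)
        xmm0 = saved;                    /* ...and restore it afterwards */
}

/* Stand-in for a base-thread function whose struct zeroing the compiler
 * happened to implement with an XMM register. */
void clobbering_fn(void *arg, long period)
{
    (void)arg; (void)period;
    xmm0.lo = 0; xmm0.hi = 0;            /* models pxor %xmm0,%xmm0 */
}
```

With export_legacy(..., 0) the model's xmm0 comes back zeroed after a tick, which is the silent corruption described in the PR; with export_compat the same function leaves it intact.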

grandixximo deleted the fix/pwmgen-sse-base-thread branch April 7, 2026 01:05